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Nonparametric model reconstruction for stochastic differential equation from 

discretely observed time-series data 

Jun OhkubcEl 

Graduate School of Informatics, Kyoto Umversity, 
Yoshida Hon-machi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan 

A scheme is developed for estimating state-dependent drift and diffusion coefficients in a stochastic 
differential equation from time-series data. The scheme does not require to specify parametric forms 
for the drift and diffusion coefficients in advance, fn order to perform the nonparametric estimation, 
a maximum likelihood method is combined with a concept based on a kernel density estimation. 
In order to deal with discrete observation or sparsity of the time-series data, a local linearization 
method is employed, which enables a fast estimation. 



I. INTRODUCTION 

Recently, it has been clarified that stochastic nature 
in small systems such as cells plays an important role 
in dynamics and behavior of biological systems I1h3| . In 
addition, due to recent experimental developments such 
as single-molecule spectroscopy, it becomes possible to 
obtain a time-series data for various stochastic phenom- 
ena. From a theoretical point of view, it is important to 
develop methods for analysis of the time-series data, and 
actually there are many studies for the analysis of the 
single- molecule time-series (for example, see Ref. 0-01 )• 

Parameter estimations from observed time-series data 
are also important research topics. If one obtains an ex- 
perimental data for a specified biochemical system, pa- 
rameters in the specified biochemical system can be es- 
timated from the experimental data. The biochemical 
system could be modelled by using a master equation 
or a stochastic differential equation (a Langevin equa- 
tion). The master equation or the stochastic differential 
equation for the specified biochemical system has some 
parameters (e.g., reaction rates). If the reaction rates 
are estimated from the experimental data, we will ob- 
tain a reconstructed model, which would reproduce the 
experimental data adequately. The reconstructed model 
enables us to perform more detailed numerical simula- 
tions and to have deep insights for the phenomenon. In 
recent years, a discrete property or sparsity in observa- 
tions has attracted much attention; it would be difficult 
to completely observe the phenomenon and to obtain 
detailed time-series data, and then we would perform 
the estimation from discretely observed time-series data. 
For example, estimation procedures based on Markov 
Chain Monte Carlo methods [^-Q and variational meth- 
ods [lO, [m have been proposed for the problem of the 
discrete observations. (In addition, there is a recent re- 
view article [IJ] for the estimation problem.) 

Here, we consider the following situation: We know 
that a time-series data can be modelled with a stochastic 
differential equation, but specific forms of drift and diffu- 
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sion coefficients of the stochastic differential equation are 
unknown in advance. That is, there is no prior knowledge 
about a time-series data, except for some basic proper- 
ties such as a memoryless property. While there are some 
works for the parametric estimation based on a maximum 
likelihood method [13, |lJ| , simple applications of these 
parametric estimations are not suitable for our problem 
here, and a nonparametric estimation scheme is needed. 
For example, we here focus on a bimodal distribution of a 
chemical substance. Relations between the biomodal dis- 
tributions and stochasticity have been discussed experi- 
mentally [l5|, [lal and theoretically [IJ, Il3 ■ Although one 
may consider that the bimodal distribution is produced 
from a double-well potential system, it has been known 
that the bimodal distribution can also be produced from 
a state-dependent noise [13, 1201 ■ I'^ addition, a recent 
study indicates that such noise-induced bimodality may 
play an important role in a decision making in a noisy en- 
vironment 21, 22i]. In these situations, it is necessary to 
judge whether a bimodal distribution is produced from a 
double-well potential system or a state-dependent noise, 
and it is enough to estimate how the drift and diffusion 
coefficients of the stochastic differential equation depend 
on the state. For the nonparametric estimations, there 
are mariy studies in various research fields. For example, 
in Ref. [23'| , a method based on estimations of Kramers- 
Moyal coefficients has been proposed. However, in the 
Kramers-Moyal coefficients estimation, adjoint Fokker- 
Planck equations should be solved numerically, which 
needs additional computational costs. In order to per- 
form nonparametric estimations for complicated systems, 
fast algorithms are required. 

In the present paper, we develop a nonparametric 
model-reconstruction method for a stochastic differential 
equation from a discretely observed time-series data. A 
kind of local estimations is employed in order to extract 
the state-dependency in the nonparametric estimation. 
In order to perform the local estimations efficiently, we 
propose a maximum likelihood method combined with 
a concept based on a kernel density estimation, which 
has been studied a lot and widely used for nonparamet- 
ric density estimation [2J]. The difficulty caused from 
the discrete observations is dealt with a local lineariza- 
tion method [2^, [2g|, which enables us to approximate 
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FIG. 1: (a) Observed time-series data. There are totally 
2000 points observed discretely, (b) Enlarged figure of (a). 
Each circle is an observation with At — 0.05, and the solid 
thin line is an original path which is produced using the Euler- 
Maruyama approximation with At = 0.001. 



a nonlinear stochastic differential equation by a locally 
linear stochastic differential equation. In addition, a use- 
ful 'second-order' form suitable for the local linearization 
method is proposed. We demonstrate that the combi- 
nation of these ideas enables us to estimate the state- 
dependent drift and diffusion coefficients from only a 
small set of discretely observed time-series data. 

The present paper is constructed as follows. In Sec. II, 
we give problem settings and an example of a time-series 
data. In Sec. Ill, we briefly review a kernel density es- 
timation, and the scheme is reformulated from a differ- 
ent point of view; we show that a maximum likelihood 
method reproduces the kernel density estimation ade- 
quately. Section IV is the main part in the present pa- 
per; an explicit estimation scheme based on the local lin- 
earization method is explained. Examples of estimation 
results for the problem introduced in Sec. II are given in 
Sec. V. Section VI is the conclusion. 



II. PROBLEM SETTINGS 

The aim of our estimation problem is to reconstruct a 
stochastic differential equation only from observed time- 
series data. In order to demonstrate the estimation, 
we here use the following Ito-type stochastic differential 
equation as a toy model: 



dxt = f''"'{xt)dt + g''^%Xt)dWt, 



(1) 



where Wt is a Wiener process, /*'^"°(a;) and g*^'^^°{x) are 
state-dependent drift and state-dependent diffusion coef- 
ficients defined as 

/t™<=(a;) = -4x^ + Ax, 5'™°(a;) - 0.2 sin(7ra;), (2) 

respectively. This model has state-dependent drift and 
diffusion coefficients. In addition, this form of the 
stochastic differential equation suggests that the poten- 
tial landscape is a double-well form, and there are two 
stable points around x — 1 and x = —1. Using the 
Euler-Maruyama scheme [23|, we generate an 'original' 
path for the stochastic differential equation ([T]). In the 
generation of the original path, we employ a time interval 
of At = 0.001. After the generation of the original path, 
we sampled data points discretely, which corresponds to 
discrete observations. Although the time interval of the 
observation can be varied, we here take data points at 
equally spaced time interval Ai — 0.05 for simplicity. 
The generated time-series data are depicted in Fig. [1] 
Hereafter, we must forget the stochastic differential equa- 
tion IH) and the state-dependency of the drift and diffu- 
sion coefficients in Eq. ([2]), and only the time-series data 
in Fig. [T] is focused and analyzed. 

Our aim is to analyze the time-series data in Fig. [T] 
under a situation that we do not have any information 
about the original model. A first guess may be as follows: 
It seems that there are two stable points around x r^ \.0 
and x ^ —1.0, and the time spent around x ^ —1.0 
seems to be a little longer than that around a; ^ 1.0. Is 
the difference caused by a potential landscape, or other 
reasons? A simple way to solve this problem is to recon- 
struct an explicit model which reproduces the time-series 
data. Hence, the problem settings are as follows: Esti- 
mate f{x) and g{x) in 



dxt = f{xt)dt + g{xt)dWt 



(3) 



from a time-series data {{Xi,Ti)\i = 1,...,A^}, where 
each data point Xi is observed at time 7^, and N is the 
total number of the observed data points. Note that 
there is no prior knowledge about the state-dependencies 
of f{x) and g{x), except that these coefficients are time- 
independent. It could be possible to deal with time- 
dependent cases, but it is beyond the scope of the present 
paper. 



III. KERNEL DENSITY ESTIMATION 

In order to develop a scheme to estimate the drift and 
diffusion coefficients in Eq. ([3]) , we firstly explain a kernel 
density estimation, which gives us many insights for our 
final aim. 



A. Brief revieAV 

A first and simple step to analyze a time-series data is 
to construct a probability density p{x) for the observed 
data. A histogram, in which the x coordinate is split 
into several bins and the numbers of data points within 
the bins are counted, is a simple method to estimate 
the probability density p{x). However, the histogram 
is based on a discrete approximation. One of the most 
widely used method for nonparametric density estima- 
tion is a kernel density estimation [2J, [2^. The ker- 
nel density estimation has been recently studied in the 
context of biophysics [29] , in which applications for the 
forced unfolding and unbinding data for proteins are dis- 
cussed. 

In the kernel density estimation, a non-negative 
real function K{x), i.e., a kernel function, is used. 
The kernel function satisfies the normalization condi- 
tion, J_ K{x)dx = 1, and it has a zero first mo- 
ment, J_ xK(x)dx = 0, and a finite second moment 

j_ x^K(x)dx < cx). In the present paper, a parame- 
ter h, a so-called bandwidth, is introduced explicitly in 
the kernel function, and we write the kernel function as 
Kh{x) = K{x/h)/h. Using the kernel function, the prob- 
ability density is estimated as 



p{x) 



1 



N 



— ^Khix 

i=l 



X, 



(4) 



There are some kernel functions, and in the present 
paper, we consider only a Gaussian kernel Kh{x) = 
(l/-\/27r/i)exp(— a:^/(2ft,^)), which has been widely used. 
The remaining task for the kernel density estimation 
is the choice of the bandwidth h of the kernel function. 
Various choices have been studied, and a famous data- 
driven method is a method based on a cross-validation 
[so']. In the cross-validation method, the following risk 
function is minimized with respect to the bandwidth h 
for the Gaussian kernel: 

Q = A + bY, [exp (-Ay 4) - Cexp (-Ay2)] , (5) 



i<j 



where 



A = {2Nh^/^)'\ B = {N'^h^/^)-\ 

Figure [2] shows the estimated probability density. 
Here, we used h = 0.06476, which is selected based on 







FIG. 2: Probability density function p{x). The solid line 
corresponds to an exact solution of Eq. ([l}. The estimated 
density from the observed time-series data is depicted with the 
dashed line. The bimodality of the density is reproduced, but 
heights of the peaks are different from the exact one because 
of the small size of the data in Fig. [l] 



the risk function ([S|). The solid line corresponds to an ex- 
act solution obtained from Eq. ([TJ . Because of the small 
size of the data in Fig. [1] there are differences between 
the estimated probability density and the exact solution. 
We note that if there is a longer time-series data, better 
estimate results are obtained. 



B. Kernel density estimation through maximum 
likelihood estimation 



The estimation of the probability density is not the aim 
in the present paper, but it is helpful for us to reformulate 
the kernel density estimation from the view point of a 
maximum likelihood estimation; this discussion gives us 
a way to reconstruct a stochastic differential equation 
without any prior knowledge. 

The maximum likelihood estimation enables us to esti- 
mate parameters in a statistical model (for example, see 
Rcf. [31]). In the context of the density estimation, the 
probability density plays a role as the parameters, and 
we seek 'probable' probability density function p{x) from 
observed data. 

Due to the concept of the kernel function, a con- 
tribution from one observed data point should be dis- 
tributed according to the kernel function. For example, 
assume that we observe a data point X = 1.0. Although 
we have only one data point, here we introduce many 
'replicas' for the observation. Each replica has a differ- 
ent 'pseudo-observation'. For instance, when there are 
four replicas and the replicas have pscudo-data-points 
X ~ 1.0, 1.2,0.7, 1.2, the probability with which we ob- 
serve the total replicas is 

p{X = 1.0)(p(X = 1.2))2p(X = 0.7). 



There are four pseudo-observations for only one real ob- 
servation, and then one may consider the fourth root, 

{p{X = 1.0))i/4(p(X = 1.2))2/4(p(X = 0.7))!/^ 

Note that the frequency of 'pseudo-observation' of X is 
the power index of each probability. Extending this dis- 
cussion and using the kernel function instead of the fre- 
quency of 'pseudo-observation', we construct the 'kernel- 
ized' likelihood function, as follows. Using a discretiza- 
tion of the X coordinate as x = j Ax, a probability with 
which we observe a 'distributed' data point Xi is written 
as 



lim TT p{jAx)'^'' 



ijAx-Xi) 



Aa;->0 



3 = -oo 



Note that if we use a delta-like function as the kernel 
function, only a contribution from p{Xi) remains, and an 
usual interpretation without the kernel function is recov- 
ered. Using a notation X ~ {Xi, . . . , Xn}, a likelihood 
function L{p(x)\X) is written as 



N oo 



L{p{x)\X) = hm n n P(j.Aa;)^-(-'"'^- 



X.) 



(6) 



i— 1 ji— — oo 



where the hat of p means that this is just the parameter 
to be estimated. Hence, the log-likelihood function is 

mx)\X) ^J2 ^^* i^ogpix,)] Kh{x, - X{). (7) 

4 = 1"'-°° 

In order to obtain the maximum likelihood estimates 
for p{x), we take a functional derivative of Eq. ([7]) with 
respect to p{x) under a constraint J dxp{x) = 1. A La- 
grange multiplier A is introduced and we consider a max- 
imization of the following function: 



N 

E 



dxi [\ogp{xi)] Kh{xi - Xi) + A / dxp{x) - 1 



Taking the functional derivative with respect to p{x) and 
setting the functional derivative is equal to zero, we ob- 
tain 



.^ 1 



t[P{x) 



Kh{x-X,) + \^Q, 



and therefore 



1 ^ 
p{x) = -Y,Kn{x-X,). 

i—1 



(8) 



(9) 



Inserting Eq. Q into the constraint condition, we have 



1 1'°° ^ N 

1 = t/ dxy2Kh{x-X,) = -. 



(10) 
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FIG. 3: A schematic illustration of the kernel density esti- 
mate pdilxo). Each data point has a weight which is smoothly 
distributed according to the kernel function. The condition- 
ing on a; = xo is carried out by another kernel function in the 
X coordinate. 



Hence, X = N and we recover Eq. Q. 

The above discussion indicates that the concept of 'dis- 
tributed' data points in 'pseudo-observation' corresponds 
to the kernel function. It would be expected that the 
combination of the 'distributed' data points and the max- 
imum likelihood method gives us more flexible estimation 
scheme than the usual one. 

It is straightforward to extend the above discussions 
to an estimation of a conditional density. Given V = 
{{Xi,Yi), . . . , {Xn, Yn)}, a sample of independent ob- 
servations from the distribution of {X, Y) , we want to 
obtain the estimation of the conditional density p(jj\x). 
In this case, the log-likelihood function should be set as 



my\x)\V) 



^ E / '^y'' \^°SP{y'i\x)] Kw{yi - Yi)Kw{x - X{), 
1=1 •'-°° 

(11) 

where Kw{y) is a kernel function for the y coordinate, 
and Kw{x) is that for the x coordinate. Using the similar 
discussion as Eq. ([7]) , we obtain the following conditional 
density estimator 



p{y\x) 



E^=lKw{x-Xi)K^{y~Yi, 
EliKw{x-X,) 



(12) 



which is the same as a conventional conditional density 
estimator |32|.l33j. 

Figure |3] shows a schematic illustration of the kernel 
conditional density estimator. We here consider a con- 
ditional density p(y|xo). The kernel functions K^ with 



bandwidth w are distributed according to the observa- 
tions. The conditioning x = Xq is carried out by another 
kernel function in the x coordinate. The kernel function 
Kw has a bandwidth W, and the center of the kernel 
function is at x = a:o- The estimation of the conditional 
density is performed by summing the N kernel functions 
in the y coordinate, weighted by the kernel function in 
the X coordinate. 

For {{Xi,Yi), . . . ,{Xn,Yn)} being random samples 
from a population having a density p(x,y), there are 
many studies for estimating conditional densities and 
properties of the conditional densities [3J-[36j . The prob- 
lem in the present paper is different from these studies; 
{{Xi,Yi), . . . , {Xn, Yn)} are not generated from an iden- 
tical density; see the next section. The maximum likeli- 
hood method developed in this section enables us to deal 
with the nonparametric estimation for a stochastic dif- 
ferential equation, and we will propose the method in the 
next section. 



IV. PROPOSED METHOD 
A. Basics 

We are now ready for constructing a method to esti- 
mate drift and diffusion coefficients in a stochastic differ- 
ential equation from a time-series data. 

Firstly, the observed data {{Xi,Ti)\i = 1,...,A^} 
is converted to a slightly different form V — 
{{X,, AX,,AU)\i = 1, . . . , TV - 1}, where AX, = X,+i - 
Xi and Ati = T^+i — T^. This conversion means that 
when a current coordinate is X^, we have an amount of 
change AXi during the time interval Ati. For each i, 
the time interval can be varied in general, and then the 
amount of change AXi depends on Ati] {{XiTAXi)} is 
not from an identical density. Due to an assumption that 
the time-series has a Markov property, {(X^, AX;)} are 
independent each other. 

Secondly, we consider a conditional probability den- 
sity p{Ax\x, 9x), which has a set of parameters O^. The 
parameters 6^ depends on a specific coordinate x, and 
the dependency of 9x on x is unknown in advance. Note 
that the parametrization of the conditional probability 
density is not related to the parametrization of drift and 
diffusion coefficients in the stochastic differential equa- 
tion in Sec. II. Using the localized parameters 6^, the 
drift and diffusion coefficients only for a specific coordi- 
nate X could be estimated. 

Thirdly, we consider the following log-likelihood func- 
tion: 

N-l ^oo 

l{d:,\'D) = Y, / d{Ax,)[\ogp{Ax,\x,9^)] 
i=i J-^ 

xk^{Ax,-AX,)Kw{x-X,). (13) 

Maximizing the above log-likelihood function with re- 
spect to the parameters Ox, it is possible to estimate the 



localized parameter 0^ adequately. 

A remaining task is to specify the conditional prob- 
ability density p{Ax\XtOx)- If a stochastic differential 
equation is linear, the conditional probability density is 
expressed as a normal distribution, i.e., AXi obeys a nor- 
mal distribution with mean Ei and variance Vi , where Ei 
and Vi depend on Xi and Ati ■ For a nonlinear stochastic 
differential equation, the description based on the nor- 
mal distribution is impossible in general. However, if the 
conditional probability density cannot be written as the 
normal distribution, the calculation scheme would be- 
come very complicated. Hence, we here assume that the 
conditional probability density is written as the normal 
distribution. The restriction with the normal distribu- 
tion seems to be severe, but we will discuss a method to 
approximate the nonlinear stochastic differential equa- 
tion to a 'locally' linear stochastic differential equation. 
The assumption of the Gaussian form gives the condi- 
tional probability density. 



\ogp{Axi\x,ex) 



{Ax, -~E,Y 
V, 



log(2^V-) 



(14) 



where Ei and Vi depend on X^, Ai^, and the parameters 
Ox. 



B. Simple estimation 

We here discuss the most simple case, i.e., Ei = 
fj-xo^ti, Vi = (JxgAti, and Bxg = {^xo^ctxo}- This means 
that for the estimation at x = xq, we assume a stochas- 
tic differential equation with constant drift and diffusion 
coefficients. Note that this constant property is assumed 
only for the estimation at x — xq, and we do not as- 
sume that /(x) and ^(x) in Eq. ^ are constant for all 
X. Repeating the estimation of fixo ^nd axo for various 
point X = Xq, we obtain the estimated drift and diffusion 
coefficients as /(x) — fix and g{x) = ax, respectively. 

The above procedure is enough for the estimation, but 
here we give a discussion for the choice of the kernel band- 
width. For simplicity, we here assume that the kernel Ky^ 
is the Gaussian kernel, and that Ati = 1 for all i. In ad- 
dition, we consider a simple case in which X, = xq for 
all i, i.e., all current coordinates are at xq. Hence, the 
kernel function K\y is constant for all i. In this case, the 
following log-likelihood function is obtained: 



liO.ol-D) 



ex 



1 N-l ^oo 



X < ^^"^^ „^""^' + log(2^a^J ) K^Ax, - AX,) 



N-l 



1 ^^^ f {AX, - fi. 



^2 „„2 



+ — +log(27rO (15) 



The maximum likelihood m.ethod gives 

N-~ 
and 



N-l 

^—T AX, 



(16) 



JV-l 



^^(AX,-M.j2+u;^ (17) 



iV- 



Here, note that the unbiased estimator for the variance 
is given as 



N-l 



^^(AX,-Ai.o)' 



iV-2 



(18) 



i=l 



Comparing Eq. (|17p with Eq. (fT8|). it is clear that the 
bandwidth w should be taken small enough for large N. 
In our problem settings in the present paper, it is possible 
to consider that ly ~ 0. Hence, the kernel Km{Axi — 
AXi) is replaced as a Dirac delta function 6{Axi — AXi) 
hereafter. 

Combining the above all discussions, we obtain the fol- 
lowing simple estimation scheme: 

Algorithm 1 (Simple method) 

1. Set a bandwidth W. 

2. For a point xo, maximize the following log- 
likelihood function with respect to 9^^ — 



l{0.o\1^) 



1 "^' ( {AX, - ^i,,At,f 



E 



X Kwix- Xi), 



^LAt, 



+ logi2nalJ 



(19) 



(20) 



(21) 



i.e., calculate the following quantities: 

_ Ef=l^ ^^^Kw{x - X,) 
^^" J:1~i' ^UKwix - X,) ' 
2 _ Et~l\^X^ - fi.oAt,)^Kw{x - Xi)IAt, 

3. Repeat 2 for various Xq. 



4. Estimate the drift and diffusion coefficients as 
f{x) = fi-j; and g{x) = a^. 



C. Local linearization method and 'second-order' 
approximation 

In Sec. IV B, the local drift and diffusion coefficients 
have simple forms, so that we easily solve the stochastic 



differential equation explicitly. Note that if we have non- 
linear drift and diffusion coefficients, we cannot obtain an 
explicit solution for the stochastic differential equation 
exactly in general. However, using a local linearization 
method [25,, J6.] , it is possible to obtain the approximate 
solution with a Gaussian form for the nonlinear stochas- 
tic differential equation. Hence, the estimation scheme 
developed in Sec. IV A is available even for the nonlinear 
cases. We can assume arbitrary drift and diffusion coef- 
ficients, and a 'second-order' approximation, which will 
be introduced soon, is one of tractable schemes. 

We firstly note that the diffusion coefficient in the 
stochastic differential equation must be positive for all 
x; this fact needs an additional constraint for the op- 
timization procedures. We, therefore, use the following 
'second-order' approximation; f{x) and g{x) around a;o 
is approximated as 



Uo(x)^l^+l^(x 



1 



Xo) 



,(2)f 



^l^'.o'i^ 



Xo) 



(22) 



{x)=e^v[s'^^J+.s'-^J{x~xo) + 



(1)^ 



eji- 



Xo) 



(23) 

and 9a:o = {nxd , M^^o : f^xo , si°^ , si^J , si7 } . Due to the ex- 
ponential form in Eq. (1231) , the diffusion coefficient g^^ (x) 
is positive for all x. 

We comment that a final estimation for the drift and 
diffusion coefficients should be performed as f{x) — 
fix , and g{x) — exp(si; ), respectively, as discussed in 
Sec. IV B. 

A stochastic differential equation with the state- 
dependent drift coefficient fxo{x) and the state- 
dependent diffusion coefficient gx^ {x) is nonlinear, and 
the local linearization method gives an analytical solution 
in a Gaussian form. As a result, the conditional proba- 
bility density p{Axi\x, O^) can be written as a Gaussian 
distribution. We briefly explain the local linearization 
method in the Appendix, and only the consequence is 
shown here. We note that the kernel function K^, is re- 
placed with the Dirac delta function according to the 
discussion in Sec. IV B. The estimation scheme based on 
the local linearization method is as follows: 

Algorithm, 2 (LL method) 

1. Set a bandwidth W. 

2. For a point xq, maximize the following log- 
likelihood function with respect to O^a — 
iu^°^ u^'^ u^'^ s^°^ s*'^ s^'H- 

KOxoP) 



N-l 



2 ^ 



V, 



+ log(27ry,) 
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-sZ>-s^J{X,-xo)-^iX,-xoy 
X Kw{x- Xi), 



(24) 



where 



Xi+AXi-xo 



Xi-Xf, 



=(0) _ «(1), 



= (2)„2 



d«exp(-4"^^-4>--4^>^), 
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and 
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(5.o(XO)'(4o^+4'oH^."^o))1, (28) 
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1 - L,Ai,) , (29) 
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3. Repeat 2 for various Xg. 

4. Estimate the drift and diffusion coefficients as 
fix) = fj,x and g{x) = exp{sx )■ 

For step 2, various standard numerical maximization or 
minimization algorithms are available. We note that the 
function c/j^g can be written using an error function or an 
imaginary error function. 



the bandwidth is narrow and the number of data points, 
A'^, is small, the estimation results have large fluctuations, 
as shown in Figs.lDJa) and (b). 

In the estimation based on the local linearization, the 
second-order approximation is used, and then the band- 
width W could be taken larger than that of the kernel 
density estimator; intuitively, it would be reasonable to 
select a bandwidth three or five times as large as the 
optimized bandwidth for the kernel density estimator. 
Larger bandwidth W enables us to use a larger number 
of effective data points for the estimation. Of course, if 
large W is employed, the 'second-order' approximation 
becomes invalid, and hence the estimation results would 
be worse. We here set W = 0.3 ad hoc. Figures lUJc) and 
(d) are the estimated results. As expected, the estima- 
tion results become smooth because a larger number of 
effective data points is available. Due to the simple as- 
sumption for Algorithm 1, the estimated results are not 
good; the usage of the larger bandwidth W gives a kind 
of averaging effects. In contrast, the estimation based 
on the local linearization (Algorithm 2) gives good esti- 
mates. 

The above results are only for one time-series data in 
Fig. [T] Next, we generated 200 time-series data with the 
same parameters, and checked the validity of our pro- 
posed method. Figures lUe) and (f) show results of aver- 
ages of the drift and diffusion coefficients for 200 trajec- 
tories, respectively. Here, we used W = 0.3 for the esti- 
mations. The error bars in Figs.lUJe) and (f) are standard 
deviations of the results. From the results, we see that 
the estimation method based on the local linearization 
(Algorithm 2) works well, especially for the estimation of 
the drift coefficients. 

Based on results on other numerical experiments, we 
comment on the choice of the bandwidth W as follows; 
the choice of the bandwidth W should be small, but if 
the number of data points is not enough, a slightly large 
bandwidth would be better. One may intuitively esti- 
mate the bandwidth from the probability density p{x). 
Although a method of the choice of the bandwidth is be- 
yond the scope of the present paper, the above choice 
would be enough in a practical sense. 



ESTIMATION RESULTS 



VI. CONCLUDING REMARKS 



We apply the two algorithms in Sec. IV to the dis- 
cretely observed data in Sec. II. In the numerical exper- 
iments, we use the Gaussian kernel. 

We should firstly set the kernel bandwidth W. It would 
be possible to select the kernel bandwidth W using some 
criteria, for example, using a cross-validation method. 
However, here we set W heuristically. A simple choice 
for W is the optimized bandwidth for the kernel den- 
sity estimator discussed in Sec. HI. In Figs.HJJa) and (b), 
we show the results obtained from the simple estimation 
(Algorithm 1) and the estimation based on the local lin- 
earization (Algorithm 2), using W = 0.06476. Because 



In the present paper, we developed a nonparametric 
estimation scheme for a stochastic differential equation 
from a discretely-observed time-series data. In order 
to make the estimation scheme, a concept based on a 
kernel density estimation was extended and a kernel- 
ized likelihood function was derived. The word, 'kernel- 
ized', means that an observed point is distributed with 
a bandwidth. In addition, we employ a local lineariza- 
tion method in order to deal with the discrete property 
or sparsity of the observations. A 'second-order' approx- 
imation was introduced, in which the diffusion coefficient 
is restricted to be positive naturally. This avoids adding 
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FIG. 4: Estimation results for the drift coefficients j(x) [panels (a), (c), and (e)], and for the diffusion coefficients g(x) [panels 
(b), (d), and (f)]. Panels (a)-(d) show results for only one trajectory in Fig. [U and panels (e) and (f) are results of averages for 
200 trajectories. The filled circle (Simple) corresponds to the results for the simple Euler scheme, and the empty boxes (LL) 
are those for the local linearization method. In (a) and (b), the bandwidth of the kernel is W = 0.06476, and in (c), (d), (e) 
and (f), W = 0.3. Solid curves in left panels [panels (a), (c), and (e)] and in right panels [panels (b), (d) and (f)] correspond 
to /*'^"°(a;) and g*'^"'^(x) in Eq. ((2|, respectively. Error bars in panels (e) and (f) are the standard deviations. 



any constraint for the maximization or minimization for 
tlic log-likelihood function in the algorithm. Using a 
toy model, we demonstrated that the estimation method 
based on the local linearization method works well. 

Although results are not shown, we applied the estima- 
tion schemes to several different models, and confirmed 
that they work well. In addition, we performed numerical 



experiments for cases with larger Ai^. If we have larger 
Aii, results become worse. This is because that the ap- 
proximation of the local linearization is inadequate for 
the larger time interval. For the large interval cases, the 
Kramers-Moyal coefficients estimation would be available 
[23j . as explained in Sec. I. However, as stated before, 
some additional computational costs are needed. On the 



other hand, our estimation scheme is based on an ap- 
proximated analytical solution of a stochastic differential 
equation, and then the computational time is largely re- 
duced. For example, a computational time for one point 
xq in Algorithm 2 is a few seconds in a laptop computer. 
In this sense, our estimation method is a complementary 
one with previous works. 

Finally, we comment that the estimation scheme for 
stochastic differential equations could be extended to 
multivariate cases straightforwardly, because the local 
linearization method has already been formulated for 
multivariate cases (25j . In addition, estimations in the 
presence of strong measurement noise |37h39{ should be 
considered in future works. 
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Appendix A: Local Linearization method 

The local linearization method [2^ [2y] is one of the 
useful approximations for nonlinear stochastic differen- 
tial equations. The basic concept of the local lineariza- 
tion method is that the nonlinear stochastic differen- 
tial equation is approximated locally as a linear stochas- 
tic differential equation. It would be possible to ob- 
tain the same consequence using a Fokker-Planck equa- 
tion. In Refs. |25l.l26l|. an approximation for the stochas- 
tic differential equation is explicitly given, and for the 
reader's convenience, we briefly review the local lineariza- 
tion method. For details, see Refs. [25ll26j. 

For simplicity, we here consider a one-dimensional 
stochastic process Xt satisfying 



dxt = A{xt)dt + B{xt)dBt, 



(Al) 



where A{xt) is twice continuously differentiable with re- 
spect to Xt, B{xt) is a continuously differentiable function 
of Xt, and Bt is a standard Brownian motion. The above 
stochastic differential equation can be transformed into 
a more tractable equation as follows: 



^(a'^ + —'^\ 






dBt 



(A2) 



where Zt — (t>{xt) and 4>{xt) satisfies an ordinary differen- 
tial equation B-^ = 1. Ito's formula immediately gives 
Eq. (IA2[) . Hence, we here only consider the following 



stochastic differential equation with a constant diffusion 
coefficient: 

dxt ^ A{xt)dt + dBf (A3) 

In the local linearization method, the drift term A{xt) 
is locally approximated by a linear function of Xt . Using 
Ito's formula, we have 



(A4) 



, , Id^A , dA , 



Here, an assumption that both coefficients in Eq. ( IA4 
are constant for a small interval [s,t) gives 



A{xt)-A{xs) 
1 d^A 



2 dx^ 



{t-s) + 



dA 
dx 



{xt - Xs). (A5) 



Hence, we obtain the drift coefficient A{xt) as 
where 



(A6) 



_ dA 

dx 



N,=A{x,,s) 



M, 



1 d'^A 



2 9a;2 



dA 
dx 



Id^A 



2 dx^ 



Finally, we obtain a linear stochastic differential equation 
as follows: 



dxt = (LsXt + Mst + Ns)dt + dBt. 



(A7) 



The linear stochastic differential equation can be solved 
analytically, and the solution is 



Xt =x. 



§ (e^=(*-^) - 1 - L^t - .)) + J' e^'(*-"'di?„, 

(A8) 



where the fourth term follows the Gaussian distribution 
with mean and variance (exp(2Ls(i — s)) — l)/(2is)- As 
a result, xt — Xg follows the Gaussian distribution with 
mean 



A{xs) 



(e^=(*-) - l) + § (e^^(*-^) - 1 - Ut - .)) 



and variance (exp(2Ls(i — s)) — l)/(2Ls). 

Combining the above results, the variable transforma- 
tion Zt = (f'i^t), and its Jacobian, we finally obtain the 
conditional probability used in Eq. (|24|) 
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